Main goals of the
session
To understand (1) how genetic variation is estimated and (2) how does random mating configure allele and genotype frequencies
To create R scripts from scratch to solve the previous objectives
In this practical, we are going to use the RStudio integrated development environment (IDE) for R. R is a programming language for statistical computing and graphics.
The current document in which we are working is an RMarkdown document. RMarkdown documents are fully reproducible and allow you to combine text, images and code –this time, R programming language.
To render a R Markdown document to a HTML file, you
just need to click the Knit button that you’ll see in the
RStudio bar. This HTML file can be shared as a report.
You don’t need to render the whole document every time you want to
see the result of your R code. You can click in the
Run current chunk button or use the keyboard shortcut
Ctrl+Alt+C and the result of the
code will appear below it.
You will see different icons through the document, the meaning of which is:
: additional or useful
information
: a worked example
: a practical exercise
: a space to answer the exercise
: a hint to solve an exercise
: a more challenging exercise
Scripts which are going to be created in this session:
Estimation of genotype and allele frequencies from a genotyping survey
Compute the effect of random mating in a Mendelian population (Hardy-Weinberg equilibrium)
HWE χ2 test on counts from a genotyping survey
Estimation of allele and genotype frequencies from count of dominant and recessive phenotypes assuming HWE
The R scripts that we are going to create will consist on R functions that will automatize the calculation of population genetics statistics, such as the genotype and allele frequencies, or testing HWE.
An R function is created by using the keyword
function(). The basic syntax of an R function definition
is:
function_name <- function(arg_1, arg_2, ...) {
# Function body code
}
The different parts of a function are:
We have the following genotype information:
| Genotypes | MM | MN | NN |
|---|---|---|---|
| Number of individuals | 1787 | 3037 | 1305 |
We are going to create an R script to estimate genotype and allele frequencies. We will create a function that takes the information of the three genotypes and outputs the genotype and allele frequencies.
# ESTIMATION OF GENOTYPE AND ALLELE FREQUENCIES FROM A GENOTYPING SURVEY
# One-gene two-alleles: X1 and X2
# Three genotypes: X11, X12 and X22
# Sample size genotype ij -> Nij
Allele frequency can be calculated from the allele count:
\[p = f(A) = \frac{N_A}{2N}\]
\[q = f(a) = \frac{N_a}{2N}\]
Allele frequency can also be calculated from the genotype count:
\[p = f(A) = \frac{N_{AA}+\frac{1}{2}N_{Aa}}{N} = \frac{1787 + \frac{1}{2}3037}{6129} = 0.539 \]
\[q = f(a) = \frac{N_{aa}+\frac{1}{2}N_{Aa}}{N} = \frac{1305 + \frac{1}{2}3037}{6129} = 0.461 \]
Genotype frequency is estimated from the genotype count:
\[f(AA) = \frac{N_{AA}}{N} = \frac{1787}{6129} = 0.292\]
\[f(Aa) = \frac{N_{Aa}}{N} = \frac{3037}{6129} = 0.496 \]
\[f(aa) = \frac{N_{aa}}{N} = \frac{1305}{6129} = 0.213 \]
Once we have clear the structure of a function, we can write the main body code that will perform the operations to calculate the allele frequencies and the genotype frequencies:
# ESTIMATION OF GENOTYPE AND ALLELE FREQUENCIES FROM A GENOTYPING SURVEY
# One-gene two-alleles: X1 and X2
# Three genotypes: X11, X12 and X22
# Sample size genotype ij -> Nij
# Create the function
gene_freq <- function(N11, N12, N22) {
N = N11 + N12 + N22 # N total sample
# Allele frequencies estimation
p = (N11 + N12/2)/N # p is the frequency of X1
q = 1 - p # q is the frequency of X2
# Genotype frequencies estimation
f11 = N11/N # frequency of X11 genotype
f12 = N12/N # frequency of X12 genotype
f22 = N22/N # frequency of X22 genotype
# Output
return (list("N" = N, "N11" = N11, "N12" = N12, "N22" = N22,
"p" = p, "q" = q,
"f11" = f11, "f12" = f12, "f22" = f22))
}
Now, we are going to invoke the function with our genotype data:
# Invoke the function
results = gene_freq(1787, 3037, 1305)
# Print results
print(paste0("Sample size = ",results$N," -> N11 = ",results$N11,", N12 = ", results$N12," and N22 = ",results$N22))
## [1] "Sample size = 6129 -> N11 = 1787, N12 = 3037 and N22 = 1305"
print(paste0("p = ", round(results$p,3)," and q = ",round(results$q,3)))
## [1] "p = 0.539 and q = 0.461"
print(paste0("f11 = ",round(results$f11,3), ", f12 = ",round(results$f12,3)," and f22 = ", round(results$f22,3)))
## [1] "f11 = 0.292, f12 = 0.496 and f22 = 0.213"
Random mating describes an ideal situation in which all individuals of one sex are equally potential partners of all members of the opposite sex. Random mating is also referred as panmixia.
Random mating is one of the requirements for the Hardy–Weinberg law to hold.
The genotypic frequencies in the initial generation are:
\[P = f(AA)\] \[Q = f(Aa)\] \[R = f(aa)\]
Table with all possible crosses:
| Cross | Cross frequency | AA | Aa | aa |
|---|---|---|---|---|
| AA\(\times\)AA | P2 | 1 | 0 | 0 |
| AA\(\times\)Aa | 2PQ | ½ | ½ | 0 |
| AA\(\times\)aa | 2PR | 0 | 1 | 0 |
| Aa\(\times\)Aa | Q2 | ¼ | ½ | ¼ |
| Aa\(\times\)aa | 2QR | 0 | ½ | ½ |
| aa\(\times\)aa | R2 | 0 | 0 | 1 |
The next generation genotypic frequencies are:
\[P' = f'(AA)\] \[Q' = f'(Aa)\] \[R' = f'(aa)\]
And can be calculated as:
\[P' = P^2 + \frac{2PQ}{2} + \frac{Q^2}{4} = (P + \frac{Q}{2})^2 = p^2\] \[Q' = PQ + 2PR + \frac{Q^2}{2} + QR = 2pq\]
\[R' = R^2 + \frac{2QR}{2} + \frac{Q^2}{4} = (R + \frac{Q}{2})^2 = q^2\]
We are going to create a function that calculate the mating pairs (frequencies probabilities), the genotype probabilities of the progeny and the allele probabilities of progeny from the initial genotype frequencies (f11, f12 and f22).
# MENDELIAN POPULATION
# RANDOM MATING - HWE
# One-gene two-alleles: X1 and X2
# Three initial genotypes: X11, X12 and X22
# Initial genotype frequencies: f11, f12, f22
# Create the function
random_mating <- function(f11, f12, f22) {
p = f11 + f12/2 # allele frequency of X1
q = 1 - p # allele frequency of X2
# Mating pairs (frequencies) probabilities
f11_11 = f11 * f11
f11_12 = 2 * f11 * f12
f11_22 = 2 * f11 * f22
f12_12 = f12 * f12
f12_22 = 2 * f12 * f22
f22_22 = f22 * f22
# Genotype probabilities of progeny (offspring) f11_, f12_, f22_
f11_ = f11_11 + f11_12/2 + f12_12/4
f12_ = f11_12/2 + f11_22 + f12_12/2 + f12_22/2
f22_ = f12_12/4 + f12_22/2 + f22_22
# Allele probability of progeny p_ and q_
p_ = f11_ + f12_/2
q_ = 1 - p_
# Output
return (list("p" = p, "q" = q,
"f11" = f11, "f12" = f12, "f22" = f22,
"p_" = p_, "q_" = q_,
"f11_" = f11_, "f12_" = f12_, "f22_" = f22_))
}
Now, we are going to invoke the function with the initial genotypes frequencies P = 0.5, Q = 0, R = 0.5:
# Invoke the function
result <- random_mating(0.5, 0, 0.5)
# Print results
print(paste0("Allele prob. initial generation (p and q) ",result$p, result$q))
## [1] "Allele prob. initial generation (p and q) 0.50.5"
print(paste("Genotype prob. initial generation (f11, f12, f22) ",result$f11, result$f12,result$f22))
## [1] "Genotype prob. initial generation (f11, f12, f22) 0.5 0 0.5"
print(paste("Allele prob. next generation (p' and q') ",result$p_, result$q_))
## [1] "Allele prob. next generation (p' and q') 0.5 0.5"
print(paste("Genotype prob. next generation (f'11, f'12, f'22) ",result$f11_, result$f12_,result$f22_))
## [1] "Genotype prob. next generation (f'11, f'12, f'22) 0.25 0.5 0.25"
print(paste("Genotype prob. next generation according HWE ", result$p_ * result$p_, 2*result$p_ * result$q_,result$q_ * result$q_))
## [1] "Genotype prob. next generation according HWE 0.25 0.5 0.25"
Two populations start with the following genotypic frequencies:
Use the previous script (random_mating) to calculate the
genotypic frequencies of the next generation (P’1, Q’1, R’1 and P’2,
Q’2, R’2) if random mating exist.
# initial genotypic frequencies population 1
P1 <- 0.24
Q1 <- 0.32
R1 <- 0.44
# initial genotypic frequencies population 2
P2 <- 0.33
Q2 <- 0.14
R2 <- 0.53
next_1 <- random_mating(P1, Q1, R1)
print(paste("Genotype prob. next generation (P’1, Q’1, R’1) ",next_1$f11_, next_1$f12_,next_1$f22_))
## [1] "Genotype prob. next generation (P’1, Q’1, R’1) 0.16 0.48 0.36"
print(paste("Genotype prob. next generation according HWE ", next_1$p_ * next_1$p_, 2*next_1$p_ * next_1$q_,next_1$q_ * next_1$q_))
## [1] "Genotype prob. next generation according HWE 0.16 0.48 0.36"
next_2 <- random_mating(P2, Q2, R2)
print(paste("Genotype prob. next generation (P’2, Q’2, R’2) ",next_2$f11_, next_2$f12_,next_2$f22_))
## [1] "Genotype prob. next generation (P’2, Q’2, R’2) 0.16 0.48 0.36"
print(paste("Genotype prob. next generation according HWE ", next_2$p_ * next_2$p_, 2*next_2$p_ * next_2$q_,next_2$q_ * next_2$q_))
## [1] "Genotype prob. next generation according HWE 0.16 0.48 0.36"
Modify the previous script to compute the impact of assortative mating on the genotype and allele frequencies. Consider the following scenarios:
Hint: Consider that individuals AA and Aa have the same phenotype, while individuals aa have a different one (A is dominant over a).
Hint: To take into account assortative mating, change the mating pairs probabilities. Here you have solved the first scenario as an example:
# Mating pairs (frequencies) probabilities
ProbAllMatings= f11*f11 + f12*f12 + f22*f22
f11_11 = f11*f11/ProbAllMatings
f11_12 = 0
f11_22 = 0
f12_12 = f12*f12/ProbAllMatings
f12_22 = 0
f22_22 = f22*f22/ProbAllMatings
# Complete assortative positive mating among genotypes
a1 <- function(f11, f12, f22) {
p = f11 + f12/2 # allele frequency of X1
q = 1 - p # allele frequency of X2
# Mating pairs (frequencies) probabilities
ProbAllMatings= f11*f11 + f12*f12 + f22*f22
f11_11 = f11*f11/ProbAllMatings
f11_12 = 0
f11_22 = 0
f12_12 = f12*f12/ProbAllMatings
f12_22 = 0
f22_22 = f22*f22/ProbAllMatings
# Genotype probabilities of progeny (offspring) f11_, f12_, f22_
f11_ = f11_11 + f11_12/2 + f12_12/4
f12_ = f11_12/2 + f11_22 + f12_12/2 + f12_22/2
f22_ = f12_12/4 + f12_22/2 + f22_22
# Allele probability of progeny p_ and q_
p_ = f11_ + f12_/2
q_ = 1 - p_
# Output
return (list("p" = p, "q" = q,
"f11" = f11, "f12" = f12, "f22" = f22,
"p_" = p_, "q_" = q_,
"f11_" = f11_, "f12_" = f12_, "f22_" = f22_))
}
# Complete assortative positive mating among phenotypes
a2 <- function(f11, f12, f22) {
p = f11 + f12/2 # allele frequency of X1
q = 1 - p # allele frequency of X2
# Mating pairs (frequencies) probabilities
ProbAllMatings= f11*f11 + 2*f11*f12 + f12*f12 + f22*f22
f11_11 = f11*f11/ProbAllMatings
f11_12 = 2*f11*f12/ProbAllMatings
f11_22 = 0
f12_12 = f12*f12/ProbAllMatings
f12_22 = 0
f22_22 = f22*f22/ProbAllMatings
# Genotype probabilities of progeny (offspring) f11_, f12_, f22_
f11_ = f11_11 + f11_12/2 + f12_12/4
f12_ = f11_12/2 + f11_22 + f12_12/2 + f12_22/2
f22_ = f12_12/4 + f12_22/2 + f22_22
# Allele probability of progeny p_ and q_
p_ = f11_ + f12_/2
q_ = 1 - p_
# Output
return (list("p" = p, "q" = q,
"f11" = f11, "f12" = f12, "f22" = f22,
"p_" = p_, "q_" = q_,
"f11_" = f11_, "f12_" = f12_, "f22_" = f22_))
}
# Complete assortative negative mating among genotypes
a3 <- function(f11, f12, f22) {
p = f11 + f12/2 # allele frequency of X1
q = 1 - p # allele frequency of X2
# Mating pairs (frequencies) probabilities
ProbAllMatings= 2*f11*f12 + 2*f11*f22 + 2*f12*f22
f11_11 = 0
f11_12 = 2*f11*f12/ProbAllMatings
f11_22 = 2*f11*f22/ProbAllMatings
f12_12 = 0
f12_22 = 2*f12*f22/ProbAllMatings
f22_22 = 0
# Genotype probabilities of progeny (offspring) f11_, f12_, f22_
f11_ = f11_11 + f11_12/2 + f12_12/4
f12_ = f11_12/2 + f11_22 + f12_12/2 + f12_22/2
f22_ = f12_12/4 + f12_22/2 + f22_22
# Allele probability of progeny p_ and q_
p_ = f11_ + f12_/2
q_ = 1 - p_
# Output
return (list("p" = p, "q" = q,
"f11" = f11, "f12" = f12, "f22" = f22,
"p_" = p_, "q_" = q_,
"f11_" = f11_, "f12_" = f12_, "f22_" = f22_))
}
# Complete assortative negative mating among phenotypes
a4 <- function(f11, f12, f22) {
p = f11 + f12/2 # allele frequency of X1
q = 1 - p # allele frequency of X2
# Mating pairs (frequencies) probabilities
ProbAllMatings= 2*f11*f22 + 2*f12*f22
f11_11 = 0
f11_12 = 0
f11_22 = 2*f11*f22/ProbAllMatings
f12_12 = 0
f12_22 = 2*f12*f22/ProbAllMatings
f22_22 = 0
# Genotype probabilities of progeny (offspring) f11_, f12_, f22_
f11_ = f11_11 + f11_12/2 + f12_12/4
f12_ = f11_12/2 + f11_22 + f12_12/2 + f12_22/2
f22_ = f12_12/4 + f12_22/2 + f22_22
# Allele probability of progeny p_ and q_
p_ = f11_ + f12_/2
q_ = 1 - p_
# Output
return (list("p" = p, "q" = q,
"f11" = f11, "f12" = f12, "f22" = f22,
"p_" = p_, "q_" = q_,
"f11_" = f11_, "f12_" = f12_, "f22_" = f22_))
}
Deviations from Hardy-Weinberg equilibrium for a genetic variant can be calculated using the χ2 test, using the observed genotype frequencies obtained from the empirical data and the expected genotype frequencies obtained using the Hardy-Weinberg proportions (HWP). The null hypothesis is that the population is in HWP, and the alternative hypothesis is that the population is not in HWP.
| Genotype | MM | MN | NN | Total |
|---|---|---|---|---|
| Number of individuals | 1787 | 3037 | 1305 | 6129 |
| Number of alleles M | 3574 | 3037 | 0 | 6611 |
| Number of alleles N | 0 | 3037 | 2610 | 5647 |
| Number of alleles M+N | 3574 | 6074 | 2610 | 12258 |
First, we calculate the allele frequency of both alleles:
\[p = \frac{N_M}{2N} = \frac{6611}{12258} = 0.539\] \[p = \frac{N_N}{2N} = \frac{5647}{12258} = 0.461\]
Then the HWE expected frequency is estimated as:
\[E(AA) = p^2 \times N = 0.539^2 \times 6129 = 1782.7\]
\[E(Aa) = 2pq \times N = 2 \times 0.539 \times 0.461 \times 6129 = 33045.6\] \[E(aa) = q^2 \times N = 0.461^2 \times 6129 = 1300.7\]
The χ2 test formula is:
\[ \chi^2 = \sum{\frac{(Expected \; number - Observed \; number)^2}{Expected \; number}} = \] \[ \frac{(1782.7 – 1787)^2}{1782.7} + \frac{(33045.6 – 3037)^2}{33045} + \frac{(1300.7 – 1305)^2}{1300} = 0.048\]
We have to compare this value with the χ2 distribution to see if its significant. The probability of exceeding the critical χ2 with a 0.05 significance and 1 degree of freedom is χ20.05;1= 3.841. Because the calculated χ2 value is lower than the critical value, the observed genotypes are in HWE.
n R, the critical value can be obtained with the following function:
qchisq(0.95, df = 1)
## [1] 3.841459
The probability of your value (Chi_value = 0.048) can be
obtained with:
pchisq(0.048, df = 1, lower = F)
## [1] 0.8265807
# HWE CHI-SQUARE TEST ON COUNTS FROM A GENOTYPING SURVEY
# One-gene two-alleles: X1 and X2
# Three initial genotypes: X11, X12 and X22
# Sample size genotype ij -> Nij
# Create the function
HWE_test <- function(N11, N12, N22) {
N = N11 + N12 + N22 # N total sample
# Allele frequencies estimation
p = (N11 + N12/2)/N # p is the frequency of X1
q = 1 - p # q is the frequency of X2
#Genotype frequencies
f11 = N11/N # frequency of X11 genotype
f12 = N12/N # frequency of X12 genotype
f22 = N22/N # frequency of X22 genotype
# Expected HWE
E11 = N*p^2
E12 = N*2*p*q
E22 = N-E11-E12
# Chi-square value and probability
Chi_value = (N11-E11)^2/E11 + (N12-E12)^2/E12 + (N22-E22)^2/E22
prob = pchisq(Chi_value, df = 1, lower = F)
# Output
return (list("N" = N, "N11" = N11, "N12" = N12, "N22" = N22,
"p" = p, "q" = q,
"f11" = f11, "f12" = f12, "f22" = f22,
"Chi_value" = Chi_value, "prob" = prob))
}
Now, we are going to invoke the function with our genotype data:
# Invoke the function
results <- HWE_test(1787,3037,1305)
# Print results
print(paste("Sample size = ", results$N," -> N11 = ", results$N11,", N12 = ", results$N12,",N22 = ", results$N22))
## [1] "Sample size = 6129 -> N11 = 1787 , N12 = 3037 ,N22 = 1305"
print(paste("p = ",round(results$p,digits=3)," and q = ", round(results$q,digits=3) ))
## [1] "p = 0.539 and q = 0.461"
print(paste("f11 = ",round(results$f11,digits=3),", f12 = ",round(results$f12,digits=3)," and f22 = ", round(results$f22,digits=3)))
## [1] "f11 = 0.292 , f12 = 0.496 and f22 = 0.213"
print(paste("Chi-square value = ",round(results$Chi_value,3),"; Prob. = ", round(results$prob,3)))
## [1] "Chi-square value = 0.048 ; Prob. = 0.826"
361 Navajo individuals were analyzed in New Mexico for the MN locus: 305 MM types, 52 MN and 4 NN. Use the previous script to answer the next questions:
Note: M and N are codominant.
navajo <- HWE_test(305, 52, 4)
# Test for the deviation from HWE and its significance.
# We check if chi square value is grater or smaller than critical value
critic_value = qchisq(0.95, df = 1)
chisq_value = round(navajo$Chi_value,3)
# if chi square value is smaller we know that exist significant deviance from Hardy-Weinberg Equilibrium
if (chisq_value > critic_value){
print("There is a significant deviance from Hardy-Weinberg Equilibrium.") # reject H0
} else {
print("There is no significant deviance from Hardy-Weinberg Equilibrium.") # accept H0
}
## [1] "There is no significant deviance from Hardy-Weinberg Equilibrium."
# What proportion of children of women of phenotype N are expected to present the maternal phenotype?
print(paste("Proportion of children of women with phenotype N expected to have maternal phenotype (NN) = ", round(navajo$f22, 3)))
## [1] "Proportion of children of women with phenotype N expected to have maternal phenotype (NN) = 0.011"
# Proportion of children of heterozygous MN women expected to have maternal phenotype (NN or MN)
print(paste("Proportion of children of heterozygous MN women expected to have the maternal phenotype = ", round(navajo$f12 + navajo$f22, 3)))
## [1] "Proportion of children of heterozygous MN women expected to have the maternal phenotype = 0.155"
Three alleles A, B, anc C of the red cell acid phosphatase locus were found in a simple of 178 English individuals. The number of individuals found with six possible genotypes are given below.
| Genotype | AA | AB | AC | BB | BC | CC |
|---|---|---|---|---|---|---|
| Number | 17 | 86 | 5 | 61 | 9 | 0 |
Modify the previous script to do the following:
Hint: Modify the code to take into account three alleles (and six genotypes) instead of two alleles (and three genotypes)
# One-gene 3-alleles: X1, X2, X3
# Genotypes: X11, X12, X13, X22, X23, X33
# Sample size genotype ij -> Nij
# The input data are the counts of each genotype ordered
HWE_test_3 <- function(N11, N12, N13, N22, N23, N33) {
N = N11 + N12 + N13 + N22 + N23 + N33 # N will be the sample size
# Allele frequencies estimation
p = (N11 + N12/2 + N13/2)/N # frequency of allele N1
q = (N22 + N12/2 + N23/2)/N # frequency of allele N2
r = 1-p-q # frequency of allele N3
# We can divide to know the enotype frequencies
f11 = N11/N
f12 = N12/N
f13 = N13/N
f22 = N22/N
f23 = N23/N
f33 = N33/N
# Expected HWE
E11 = N*p^2
E12 = N*2*p*q
E13 = N*2*p*r
E22 = N*q^2
E23 = N*2*q*r
E33 = N*r^2
# Chi-square value and probability
Chi_val = (N11-E11)^2/E11 + (N12-E12)^2/E12 + (N13-E13)^2/E13 + (N22-E22)^2/E22 + (N23-E23)^2/E23 + (N33-E33)^2/E33
prob = pchisq(Chi_val, df = 2, lower = F)
# Output
return (list("N" = N, "N11" = N11, "N12" = N12, "N13" = N13, "N22" = N22, "N23" = N23, "N33" = N33,
"p" = p, "q" = q, "r" = r,
"f11" = f11, "f12" = f12, "f13" = f13, "f22" = f22, "f23" = f23, "f33" = f33,
"Chi_value" = Chi_val, "prob" = prob))
}
# Estimate the frequencies of alleles A, B, and C
hwe <- HWE_test_3(17, 86, 5, 61, 9, 0)
print(paste("Frequency of A:", round(hwe$p, 3)))
## [1] "Frequency of A: 0.351"
print(paste("Frequency of B:", round(hwe$q, 3)))
## [1] "Frequency of B: 0.61"
print(paste("Frequency of C:", round(hwe$r, 3)))
## [1] "Frequency of C: 0.039"
# Use a χ2 test to examine whether these data are consistent with HWE
critical_value = qchisq(0.95, df = 2)
chisq_value = round(hwe$Chi_value, 3)
if (chisq_value > critical_value){
print("The data not follow with Hardy-Weinberg Equilibrium.")
} else {
print("The data follow Hardy-Weinberg Equilibrium.")
}
## [1] "The data follow Hardy-Weinberg Equilibrium."
Consider an allele A which is totally dominate over a. That means that the dominant phenotype will show if either the homozygous AA or heterozygous Aa genotypes occur. This type of inheritance occurs with the Rh human blood types, for example. There are two alleles: D and d. Individuals who are homozygous dominant (DD) or heterozygous (Dd) are Rh+, and therefore, have Rh antigens). Those who are homozygous recessive (dd) are Rh- (they don’t have Rh antigens).
To calculate genotype frequencies from counts of dominant (or recessive) phenotypes, we have to consider the following. The recessive phenotype is controlled by the homozygous aa genotype, while the frequency of the dominant phenotype equals the sum of the frequencies of AA and Aa. Therefore, the recessive phenotype is simply the frequency of aa:
| Phenotype | Genotype | Number of individuals |
|---|---|---|
| Rh+ | AA or Aa | 170 |
| Rh- | aa | 30 |
From the phenotypic frequencies we know that:
\[f(AA) + f(Aa) = \frac{N{A\_}}{N} = \frac{170}{200} = 0.8 \]
\[ f(aa) = \frac{N{aa}}{N} = \frac{30}{200} = 0.15 \]
We use the frequency of the recessive genotype to estimate q:
\[ q = \sqrt{f(aa)} = \sqrt{0.15} = 0.387 \] \[ p = 1 - q = 0.613 \]
Then, the genotype frequencies can be estimated according to the HWE:
\[ f(AA) = p^2 = 0.613^2 = 0.375 \] \[ f(Aa) = 2pq = 2 \times 0.613 \times 0.387 = 0.475 \]
\[ f(aa) = q^2 = 0.387^2 = 0.15 \]
Some examples of Mendelian recessive phenotypes in humans:
# ESTIMATION OF ALLELE AND GENOTYPE FREQUENCIES FROM COUNTS OF DOMINANT AND
# RECESSIVE PHENOTYPES ASSUMING HWE
# One-gene two-alleles with dominance: X1 > X2
# Three genotypes: X11, X12 and X22
# Two phenotypes: N1_ and N22
# Sample size genotype ij -> Nij
# Create the function
freq_dominance <- function(NA_, Naa) {
N = NA_ + Naa # N total sample
# Phenotypic frequencies
faa = Naa/N # Frequency genotype aa
# Allele frequency
q = sqrt(faa)
p = 1 - q;
# Genotype frequencies according HWE
f11 = p^2
f12 = 2*p*q
f22 = q^2
# Output
return (list("N" = N, "NA_" = NA_, "Naa" = Naa,
"p" = p, "q" = q,
"f11" = f11, "f12" = f12, "f22" = f22))
}
Now, we are going to invoke the function with our genotype data:
# Invoke the function
results <- freq_dominance(170, 30)
# Print results
print(paste("Sample size = ", results$N," -> NA_ = ",results$NA_,",Naa = ", results$Naa))
## [1] "Sample size = 200 -> NA_ = 170 ,Naa = 30"
print(paste("p = ",round(results$p,digits=3)," and q = ", round(results$q,digits=3) ))
## [1] "p = 0.613 and q = 0.387"
print(paste("fAA = ",round(results$f11,digits=3),", fAa = ",
round(results$f12,digits=3)," and faa = ", round(results$f22,digits=3) ))
## [1] "fAA = 0.375 , fAa = 0.475 and faa = 0.15"
What is the frequency of heterozygotes Aa in a population with random mating in which the frequency of the dominant phenotype is 0.19?
results <- freq_dominance(0.19 * 200, 0.81 * 200)
# Print the frequency of heterozygotes Aa
print(paste("Frequency of heterozygotes Aa: ", round(results$f12, digits = 3)))
## [1] "Frequency of heterozygotes Aa: 0.18"
What is the frequency of heterozygotes Aa in a population with random mating if the frequency of the recessive phenotype aa is 0.09?
results <- freq_dominance(0.91 * 200, 0.09 * 200)
# Print the frequency of heterozygotes Aa
print(paste("Frequency of heterozygotes Aa: ", round(results$f12, digits = 3)))
## [1] "Frequency of heterozygotes Aa: 0.42"
1 in 1700 US Caucasian newborns have cystic fibrosis. C is the normal allele, dominant over the recessive c. Individuals must be homozygous for the recessive allele to have the disease. Calculate the following, using the previous script:
freq_cc <- 1/1700 # 1 in 1700 US Caucasian newborns have cystic fibrosis.
result <- freq_dominance(1-freq_cc, freq_cc)
print(paste("Percent of population with cystic fibrosis:", round(freq_cc*100, 3), "%"))
## [1] "Percent of population with cystic fibrosis: 0.059 %"
print(paste("Frequency of recessive allele in the population:", round(result$q, 5)))
## [1] "Frequency of recessive allele in the population: 0.02425"
print(paste("Frequency of dominant allele in the population:", round(result$p, 5)))
## [1] "Frequency of dominant allele in the population: 0.97575"
print(paste("Percentage of carriers (heterozygous individuals) in the population:", round(result$f12*100, 3), "%"))
## [1] "Percentage of carriers (heterozygous individuals) in the population: 4.733 %"
In the human population, one in every 10,000 babies is albino. Using the previous script, calculate the approximate frequency of the allele produced by albinism, knowing that it is autosomal and recessive. What proportion of non-albino individuals are carriers?
# frequency of albino phenotype (aa)
freq_albino <- 1/10000
result <- freq_dominance(1-freq_albino, freq_albino)
# The frequency of the recessive allele in the population.
print(paste("Frequency of recessive allele:", round(result$q, 5)))
## [1] "Frequency of recessive allele: 0.01"
# The percentage of heterozygous individuals (carriers) in the population
print(paste("Percentage of peopple non-albino individuals that are carriers:", round(result$f12*100, 3), "%"))
## [1] "Percentage of peopple non-albino individuals that are carriers: 1.98 %"
The HWE is a characteristic of a random mating population, if no external evolutionary forces are acting on it (for example, natural selection, migration or mutation). We are going to study what happens when the Mendelian law of equal segregation is not met.
Consider that the heterozygote “Aa” segregates in proportion 2(A):1(a), instead of an equal proportion (1)A:(1)a.
| Cross | Cross frequency | AA | Aa | aa |
|---|---|---|---|---|
| AA × AA | P² | |||
| AA × Aa | 2PQ | |||
| AA × aa | 2PR | |||
| Aa × Aa | Q² | |||
| Aa × aa | 2QR | |||
| aa × aa | R² |
\[ P' = ... \] \[ Q' = ... \] \[ R' = ... \]
Modify the random_mating() function with these new
probabilities and answer these questions:
What allele and genotype frequencies are expected after one
generation of random mating? Modify the random_mating()
function with the new crossing probabilities and the next-generation
genotypic frequencies that you calculated. Use the following initial
frequencies: f11=0.5, f12=0,
f22=0.5.
What allele and genotype frequencies are expected over 25 generations? For example, create a for loop where the output frequencies are the input frequencies of the next run:
for (i in 1:25) {
}
To better understand whats happening with the allele frequencies, we
are going to plot how the allele frequency of A changes over
generations. To do that, we are going to create an empty vector and we
will store the values of p to generate the plot.
p <- vector()
for (i in 1:25) {
# call the function
result <- random_mating ...
p[i] <- result$p
}
random_mating() function, that assumes that alleles equally
segregate, using the same initial frequencies: f11=0.5,
f12=0, f22=0.5. What pattern do you observe?
Is it the same as before?random_mating <- function(f11, f12, f22) {
p = f11 + f12/2 # allele frequency of X1
q = 1 - p # allele frequency of X2
# Mating pairs (frequencies) probabilities
f11_11 = f11*f11
f11_12 = 2*f11*f12
f11_22 = 2*f11*f22
f12_12 = f12*f12
f12_22 = 2*f12*f22
f22_22 = f22*f22
# Genotype probabilities of progeny (offspring) f11_, f12_, f22_
f11_ = f11_11 + f11_12/2 + f12_12/4
f12_ = f11_12/2 + f11_22 + f12_12/2 + f12_22/2
f22_ = f12_12/4 + f12_22/2 + f22_22
# Allele probability of progeny p_ and q_
p_ = f11_ + f12_/2
q_ = 1 - p_
# Output
return (list("p" = p, "q" = q,
"f11" = f11, "f12" = f12, "f22" = f22,
"p_" = p_, "q_" = q_,
"f11_" = f11_, "f12_" = f12_, "f22_" = f22_))
}
p <- vector()
generations <- 1:25
f11 <- 0.5
f12 <- 0
f22 <- 0.5
res <- random_mating(f11, f12, f22)
print(paste("Allele prob. next generation (p' and q') ", res$p_, res$q_))
## [1] "Allele prob. next generation (p' and q') 0.5 0.5"
print(paste("Genotype prob. next generation (f'11, f'12, f'22) ",res$f11_, res$f12_,res$f22_))
## [1] "Genotype prob. next generation (f'11, f'12, f'22) 0.25 0.5 0.25"
for (i in 1:25) {
result <- random_mating(f11, f12, f22)
f11 <- result$f11_
f12 <- result$f12_
f22 <- result$f22_
p[i] <- result$p
}
plot(generations, p, type = "l", xlab = "Generations", ylab = "Allele Frequency", main = "Change in Allele Frequency over time")
Modify the function random_mating() to compute the
genotype and allele frequencies for the case of a X-linked gene.
Hint: To consider an X-linked loci, you have to take into account that males are hemizygous for the locus, so simply count males as having only one allele for each frequency calculation.
P = f(AA) R = f(A)
H = f(Aa) S = f(a)
Q = f(aa)
Study the evolutionary dynamics: suppose that we have the extreme
initial case pm(0) = 1 and pf(0) = 0. Plot how
pm and pf changes over 25 generations. What
pattern do you observe?
What equilibrium is eventually reached?
# Define the random mating function with updated crossing probabilities
random_mating <- function(f11, f12, f22, f1, f2) {
p = f11 + f12/2 + f1/2 # allele frequency of X1
q = 1 - p # allele frequency of X2
# Mating pairs (frequencies) probabilities
all_prob = (f1 * f11) + (2 * f1 * f12) + (f1 * f22) + (2* f2 * f12) + (f2 * f22) + (f2 * f11)
f1_11 = f1 * f11 / all_prob
f1_12 = 2 * f1 * f12 / all_prob
f1_22 = f1 * f22 / all_prob
f2_12 = 2* f2 * f12 / all_prob
f2_22 = f2 * f22 / all_prob
f2_11 = f2 * f11 / all_prob
# Genotype probabilities of progeny (offspring) f11_, f12_, f22_
f11_ = f1_11/2 + f1_12/4
f12_ = f1_12/4 + f1_22/2 + f2_12/4 + f2_11/2
f22_ = f2_12/4 + f2_22/2
f1_ = f1_12/4 + f1_11/2 + f2_12/4 + f2_11/2
f2_ = f1_12/4 + f1_22/2 + f2_12/4 + f2_22/2
# Allele probability of progeny p_ and q_
p_ = f11_ + f12_/2 + f1_/2
q_ = 1 - p_
# Output
return (list("p" = p, "q" = q,
"f11" = f11, "f12" = f12, "f22" = f22,
"p_" = p_, "q_" = q_,
"f11_" = f11_, "f12_" = f12_, "f22_" = f22_, "f1_" = f1_, "f2_" = f2_))
}
# Initial frequencies
f11 <- 0
f12 <- 0
f22 <- 1
f1 <- 1
f2 <- 0
# Empty vectors to store allele frequencies over generations
p <- vector()
q <- vector()
# Calculate allele frequencies over 25 generations
for (i in 1:25) {
result_final2 <- random_mating(f11, f12, f22, f1, f2)
f11 <- result_final2$f11_
f12 <- result_final2$f12_
f22 <- result_final2$f22_
f1 <- result_final2$f1_
f2 <- result_final2$f2_
p[i] <- result_final2$p
q[i] <- result_final2$q
}
plot(generations, p, type = "l", col = "yellow", xlab = "Generation", ylab = "Allele Frequency", main = "Change in Allele Frequency of A (p) over Generations")
plot(generations, q, type = "l", col = "gray", xlab = "Generation", ylab = "Allele Frequency", main = "Change in Allele Frequency of a (q) over Generations")
Deliver this document in Aul@-ESCI with your answers
Deadline: 14 April 2024
Doubts? marta.coronado@csic.es and olga.dolgova@crg.es.